Unsupervised Labeling of Data for Supervised Learning and Its Application to Medical Claims Prediction

Authors

  • Che Ngufor
  • Janusz Wojtusiak
Abstract

Identifying changes and irregularities in medical insurance claim payments is difficult. Traditional practice involves querying historical claims databases and flagging claims as normal or abnormal. Because what counts as a normal payment is usually unknown and may change over time, abnormal payments often pass undetected, only to be discovered after the payment period has passed. This paper addresses on-line unsupervised learning from data streams when the distribution that generates the data changes, or drifts, over time. It presents automated algorithms for detecting concept drift in the probability distribution of the data. The idea behind the drift detection methods is to transform the distribution of the data within a sliding window into a more convenient distribution. The p-value of a test statistic, compared against a given significance level, can then be used to infer the drift rate, adjust the window size, and decide on the status of the drift. The detected concept drifts are used to label the data for subsequent learning of classification models by a supervised learner. The algorithms were tested on several synthetic and real medical claims data sets.
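The sliding-window idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the transformation or the test statistic, so this example assumes a plain two-sample z-test on window means (by the central limit theorem the window mean is approximately normal), uses a fixed window size rather than an adaptive one, and does not estimate a drift rate. The names `two_sided_p_value`, `label_by_drift`, `window_size`, and `alpha` are hypothetical.

```python
import math
from statistics import fmean, variance


def two_sided_p_value(reference, window):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(variance(reference) / len(reference)
                   + variance(window) / len(window))
    if se == 0.0:
        # Degenerate case: both samples are constant.
        return 1.0 if fmean(reference) == fmean(window) else 0.0
    z = (fmean(window) - fmean(reference)) / se
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


def label_by_drift(stream, window_size=30, alpha=0.001):
    """Label each observation with a concept id that increments at each
    detected drift. Assumes window_size >= 2 (illustrative default)."""
    labels = []
    concept = 0
    reference = []  # older observations attributed to the current concept
    window = []     # the most recent observations
    for x in stream:
        window.append(x)
        if len(window) > window_size:
            # The oldest point leaves the window and joins the reference.
            reference.append(window.pop(0))
        if len(reference) >= window_size and len(window) == window_size:
            if two_sided_p_value(reference, window) < alpha:
                # Drift detected: start a new concept from scratch.
                concept += 1
                reference = []
                window = []
        labels.append(concept)
    return labels


if __name__ == "__main__":
    # Synthetic payments: a stable level, then an abrupt upward shift.
    payments = [99.0, 101.0] * 100 + [129.0, 131.0] * 100
    labels = label_by_drift(payments)
    print("first observation of the new concept:", labels.index(1))
```

The resulting concept ids can serve as class labels for any supervised learner. Detection lags the true change point by a few observations, because the window must accumulate enough post-change data before the test becomes significant.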

Journal:
  • Computer Science (AGH)

Volume 14

Publication date: 2013